Speaker normalized acoustic modeling based on 3-D Viterbi decoding
Authors
Abstract
This paper describes a novel method for speaker normalization based on a frequency warping approach to reduce variations due to speaker-induced factors such as the vocal tract length. In our approach, a speaker normalized acoustic model is trained using time-varying (i.e., state, phoneme or word dependent) warping factors, whereas in conventional approaches the frequency warping factor is fixed for each speaker. These time-varying frequency warping factors are determined by a 3-dimensional (i.e., input frames, HMM states and warping factors) Viterbi decoding procedure. Experimental results on Japanese spontaneous speech recognition show that the proposed method yields a 9.7% improvement in speech recognition accuracy compared to the conventional speaker-independent model.
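The 3-dimensional search named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `viterbi_3d`, the uniform initialization, and the `max_warp_step` limit on how far the warping index may move between frames are all assumptions made for illustration. The input `log_obs[t][s][w]` is assumed to hold the log-likelihood of frame `t`, warped with the `w`-th candidate factor, under HMM state `s`.

```python
import math


def viterbi_3d(log_obs, log_trans, max_warp_step=1):
    """Best joint (state, warp) path over a frames x states x warps trellis.

    log_obs[t][s][w]  : log-likelihood of frame t under state s, warp index w
    log_trans[sp][s]  : log transition score from state sp to state s
    max_warp_step     : assumed limit on warp-index change between frames
    Returns (best_score, state_path, warp_path).
    """
    T, S, W = len(log_obs), len(log_obs[0]), len(log_obs[0][0])
    # Assumption: every (state, warp) pair is an equally likely start.
    delta = [[list(log_obs[0][s]) for s in range(S)]]
    back = [None]
    for t in range(1, T):
        d_t = [[-math.inf] * W for _ in range(S)]
        b_t = [[(0, 0)] * W for _ in range(S)]
        for s in range(S):
            for w in range(W):
                lo = max(0, w - max_warp_step)
                hi = min(W - 1, w + max_warp_step)
                best, arg = -math.inf, (0, lo)
                # Maximize over previous states and reachable previous warps.
                for sp in range(S):
                    for wp in range(lo, hi + 1):
                        score = delta[t - 1][sp][wp] + log_trans[sp][s]
                        if score > best:
                            best, arg = score, (sp, wp)
                d_t[s][w] = best + log_obs[t][s][w]
                b_t[s][w] = arg
        delta.append(d_t)
        back.append(b_t)
    # Pick the best final cell, then trace back both paths together.
    best, s, w = max(
        (delta[-1][s][w], s, w) for s in range(S) for w in range(W)
    )
    states, warps = [s], [w]
    for t in range(T - 1, 0, -1):
        s, w = back[t][s][w]
        states.append(s)
        warps.append(w)
    return best, states[::-1], warps[::-1]
```

Setting `max_warp_step=0` forces a single warping index for the whole utterance, which corresponds to the conventional fixed-per-speaker case the abstract contrasts with; larger values let the factor vary over time.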
Similar resources
The dynamically-adjustable histogram pruning method for embedded voice dialing
Memory and speed are two key factors that must be addressed when deploying a voice dialer on Pocket PCs. To provide a solution, a novel decoding method that incorporates the score differences of token paths is proposed, named “Dynamically-Adjustable Histogram Pruning”. Additionally, the computation of likelihood scores is accelerated by means of a dynamic score lookup table. Furthermore, a new acoustic ...
Speaker Diarization Based on GMM Supervectors and Unsupervised Intra-speaker Variability Modeling
This paper presents a novel framework for speaker diarization. Audio is parameterized by a sequence of GMM-supervectors representing overlapping short segments of speech. Session-dependent intra-session intra-speaker variability is estimated online in an unsupervised manner, and is removed from the supervectors using Nuisance Attribute Projection (NAP). The supervectors are then projected using ...
Probabilistic Speaker-Class based Acoustic Modeling for Large Vocabulary Continuous Speech Recognition
In this paper, a probabilistic speaker-class (PSC) based acoustic modeling method is proposed for taking into account speaker variability influence in HMM-based speech recognition systems. Firstly, within the context of speaker-class based speech recognition, an experiment is conducted to investigate the performance of speaker-class recognition based on hard-cut speaker clustering. Then, in the...
Augmented state space acoustic decoding for modeling local variability in speech
This paper presents a decoding method for automatic speech recognition (ASR) that reduces the impact of local spectral and temporal variabilities on ASR performance. The procedure involves augmenting the standard Viterbi search for an optimum state sequence with a locally constrained search for optimum degrees of spectral warping or temporal warping applied to individual analysis frames. It is ...
A sub-optimal Viterbi-like search for linear dynamic models classification
This paper describes a Viterbi-like decoding algorithm applied to segment models based on linear dynamic models (LDMs). LDMs are a promising acoustic modeling scheme which can alleviate several of the limitations of the popular Hidden Markov Models (HMMs). Several implementations of LDMs can be found in the literature. For our decoding experiments we consider general identifiabl...